A Nearest Neighbors Algorithm for Strings
Abstract
The algorithm is discussed in the context of one of its practical applications: aligning DNA reads to a reference genome. An implementation of the algorithm is shown to align about 10^6 reads per CPU minute and about 10^8 base-pairs per CPU minute (human DNA reads). This implementation is compared to the popular software packages Bowtie and BWA, and is shown to be over 5-10 times faster in some applications.
Similar resources
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet dominate users' lives, they bring not only business opportunities and time savings but also hardware and software threats such as information theft and system penetration. Security is the top priority in preventing a cyber-attack, and users should first detect the type of attack because virtual environments are not moni...
A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
Email is one of the fastest and most economical forms of communication. The growing number of email users has increased the volume of spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...
Diagnosis of Heart Disease Using Binary Grasshopper Optimization Algorithm and K-Nearest Neighbors
Introduction: The heart is one of the main organs of the human body, and poor heart health is a major factor in human mortality. Heart disease may be asymptomatic, but medical tests can predict and diagnose it. Diagnosing heart disease requires the extensive experience of specialist physicians. The aim of this study is to help physicians diagnose heart disease based on hybrid Binary Grasshop...
Non-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition, and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. In most studies both the data and the query are assumed to be precise; however, due to the real applications of NN searching, suc...
Searching for Features using a Genetic Algorithm
Automatic classification of words uses abstract representations of lexical items. The representations are usually not easily derived from the data available (strings of letters). This is a core problem in nearest neighbor methods. This article describes research towards a genetic algorithm for inventing features of relevance for automatic word classification. The GA attempts to optimize a repres...